Russian Monitor Corpora: Composition, Linguistic Encoding and Internet Publication
Abstract
Russian monitor corpora seek to reflect the current state of Russian. The corpus currently contains … million words and will never be complete because, like language itself, it is always developing: new samples of language are added while other texts are deleted, so that the corpus represents the current state of the language. Progress in Russian language processing makes it possible to apply its results to creating, with the help of linguistic software, Russian monitor corpora closely connected with a set of electronic dictionaries. Our approach depends in particular on monitoring Russian resources published on the Internet and on CD, on the Russicon language processor, and on wide use of the Russicon electronic dictionaries. In its Internet version, the pilot corpus query system for Java allows the user:
• to use a selected subcorpus, several subcorpora, or the whole corpus;
• to search for a word either in a particular form or across its whole paradigm;
• to extend the context from one line (the default) to several lines.
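The abstract describes three query options: choosing a subcorpus or the whole corpus, searching a single word form or a whole paradigm, and widening the context. It also describes the monitor principle of adding new texts while deleting old ones. A minimal Python keyword-in-context sketch can illustrate both. It is not the pilot Java system. `Text`, `MonitorCorpus`, `tokenize` and the one-entry `PARADIGMS` table are hypothetical stand-ins. The paradigm lookup only imitates what a Russicon-style electronic dictionary might supply, and the word-budget, oldest-first deletion policy is an assumption.

```python
import re
from collections import deque
from dataclasses import dataclass


def tokenize(line: str) -> list[str]:
    # For str patterns, \w is Unicode-aware, so Cyrillic words are matched.
    return re.findall(r"\w+", line.lower())


@dataclass
class Text:
    """One corpus text: a subcorpus label plus its lines (hypothetical layout)."""
    subcorpus: str
    lines: list[str]

    def word_count(self) -> int:
        return sum(len(tokenize(line)) for line in self.lines)


# Hypothetical stand-in for an electronic dictionary such as Russicon:
# lemma -> set of inflected forms (a tiny, incomplete sample paradigm).
PARADIGMS = {
    "книга": {"книга", "книги", "книге", "книгу", "книгой", "книг"},
}


class MonitorCorpus:
    """A monitor corpus with an assumed word budget: when a new text pushes
    the total over `max_words`, the oldest texts are deleted first."""

    def __init__(self, max_words: int):
        self.max_words = max_words
        self.texts: deque = deque()
        self.words = 0

    def add(self, text: Text) -> None:
        self.texts.append(text)
        self.words += text.word_count()
        # Evict the oldest texts, but always keep the newest one.
        while self.words > self.max_words and len(self.texts) > 1:
            self.words -= self.texts.popleft().word_count()

    def query(self, word, subcorpora=None, whole_paradigm=False, context=1):
        """Return (subcorpus, line index, context lines) for each matching line.

        subcorpora: collection of subcorpus names, or None for the whole corpus.
        whole_paradigm: match every listed form of `word`, not only `word` itself.
        context: number of lines returned per hit (1 = the matching line only).
        """
        word = word.lower()
        forms = PARADIGMS.get(word, {word}) if whole_paradigm else {word}
        half = (context - 1) // 2  # up to this many lines before the hit
        hits = []
        for text in self.texts:
            if subcorpora is not None and text.subcorpus not in subcorpora:
                continue
            for i, line in enumerate(text.lines):
                if forms & set(tokenize(line)):
                    start = max(0, i - half)
                    hits.append((text.subcorpus, i, text.lines[start:start + context]))
        return hits


corpus = MonitorCorpus(max_words=1000)
corpus.add(Text("news", ["Вышла новая книга.", "Её уже читают.", "Автор доволен."]))
corpus.add(Text("fiction", ["Он открыл книгу.", "Глава первая."]))
print(len(corpus.query("книга")))                       # 1: exact form only
print(len(corpus.query("книга", whole_paradigm=True)))  # 2: "книга" and "книгу"
print(corpus.query("книга", subcorpora={"fiction"}, whole_paradigm=True, context=2))
```

Deleting the oldest texts under a fixed word budget is only one way to keep a monitor corpus current. The abstract does not say which texts are removed or when, so a date cutoff or a per-subcorpus quota would fit the description equally well.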
Similar resources
Integration of Russian Language Resources
In this paper we describe the creation of large-scale linguistic resources for the Russian language. An Internet/intranet system architecture was developed to make a large volume of Russian lexical information, corpora (texts) and a knowledge base (Russian WordNet) available to the system at development and/or run time. There are four linguistic counterparts, corresponding to the major categori...
Syntactic Complexity of Russian Unified State Exam Texts in English: A Study on Reliability and Validity
In this study we analyze texts used in the Russian Unified State Exam (USE) in English. The texts, which formed two small research corpora, were retrieved from two sources: the official USE database, as a reference point, and “Neznaika” (https://neznaika.pro/), a popular website that pupils use for USE training. The two corpora are balanced in size: USE has 11,934 tokens and “Neznaika” has 11,918 tokens. We share Biber’...
Encoding Linguistic Corpora
This paper describes the motivation and design of the Corpus Encoding Standard (CES) (Ide et al., 1996; Ide, 1998), an encoding standard for linguistic corpora intended to meet the need for standardized encoding practices. The CES identifies a minimal encoding level that corpora must achieve to be considered standardized in terms of descriptive repre...
Collection, Annotation and Analysis of Gold Standard Corpora for Knowledge-Rich Context Extraction in Russian and German
This paper describes the collection, annotation and linguistic analysis of a gold standard for knowledge-rich context extraction on the basis of Russian and German web corpora as part of ongoing PhD thesis work. In the following sections, the concept of knowledge-rich contexts is refined and gold standard creation is described. Linguistic analyses of the gold standard data and their results are...
Design and Data Collection for the Accentological Corpus of the Russian Language
The Accentological corpus gives researchers an opportunity to study word stress and stress variation, which are very important in Russian. Moreover, the Accentological corpus makes it possible to study the history of stress development in Russian. The research presents the main characteristics of the Accentological corpus available at ruscorpora.ru. Corpora size, type and sources of text m...